internal consistency
- North America > United States > California > Los Angeles County > Glendale (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Calibrating Reasoning in Language Models with Internal Consistency
Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious mistakes and contradictions, raising doubts about their ability to robustly process and utilize generated rationales. In this work, we investigate reasoning in LLMs through the lens of internal representations, focusing on how these representations are influenced by generated rationales. Our preliminary analysis reveals that while generated rationales improve answer accuracy, inconsistencies emerge between the model's internal representations in middle layers and those in final layers, potentially undermining the reliability of their reasoning processes. To address this, we propose internal consistency as a measure of the model's confidence by examining the agreement of latent predictions decoded from intermediate layers. Extensive empirical studies across different models and datasets demonstrate that internal consistency effectively distinguishes between correct and incorrect reasoning paths. Motivated by this, we propose a new approach to calibrate reasoning by up-weighting reasoning paths with high internal consistency, resulting in a significant boost in reasoning performance.
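A minimal sketch of the idea described in the abstract above, not the authors' implementation: latent predictions are decoded from intermediate layers with a logit-lens-style readout (final layer norm plus LM head), internal consistency is taken as the fraction of layers that agree with the final-layer prediction, and sampled reasoning paths are up-weighted by that score before voting. The dummy model, tensor shapes, and answer-token setup are illustrative assumptions.

```python
# Hedged sketch: internal consistency from intermediate-layer latent predictions,
# then consistency-weighted voting over reasoning paths. Not the paper's code.
import torch

def internal_consistency(hidden_states, ln_f, lm_head, answer_token_ids):
    """hidden_states: list of [hidden_dim] tensors, one per layer, taken at the
    position where the model emits its answer token (assumed setup)."""
    latent_choices = []
    for h in hidden_states:
        logits = lm_head(ln_f(h))                 # decode this layer's latent prediction
        answer_logits = logits[answer_token_ids]  # restrict to candidate answer tokens
        latent_choices.append(int(answer_logits.argmax()))
    final_choice = latent_choices[-1]
    return sum(c == final_choice for c in latent_choices) / len(latent_choices)

def weighted_vote(paths):
    """paths: list of (answer, consistency_score); up-weight consistent paths."""
    scores = {}
    for answer, w in paths:
        scores[answer] = scores.get(answer, 0.0) + w
    return max(scores, key=scores.get)

if __name__ == "__main__":
    torch.manual_seed(0)
    dim, vocab = 16, 100
    ln_f, lm_head = torch.nn.LayerNorm(dim), torch.nn.Linear(dim, vocab, bias=False)
    hs = [torch.randn(dim) for _ in range(12)]    # dummy per-layer hidden states
    score = internal_consistency(hs, ln_f, lm_head, torch.tensor([10, 11, 12, 13]))
    print("internal consistency:", score)
    print("voted answer:", weighted_vote([("A", score), ("B", 0.3), ("A", 0.8)]))
```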
A Framework Based on Graph Cellular Automata for Similarity Evaluation in Urban Spatial Networks
Wu, Peiru, Zhai, Maojun, Zhang, Lingzhu
Measuring similarity in urban spatial networks is key to understanding cities as complex systems. Yet most existing methods are not tailored for spatial networks and struggle to differentiate them effectively. We propose GCA-Sim, a similarity-evaluation framework based on graph cellular automata. Each submodel measures similarity by the divergence between value distributions recorded at multiple stages of an information evolution process. We find that some propagation rules magnify differences among network signals; we call this "network resonance." With an improved differentiable logic-gate network, we learn several submodels that induce network resonance. We evaluate similarity through clustering performance on fifty city-level and fifty district-level road networks. The submodels in this framework outperform existing methods, with Silhouette scores above 0.9. Using the best submodel, we further observe that planning-led street networks are less internally homogeneous than organically grown ones; morphological categories from different domains contribute with comparable importance; and degree, as a basic topological signal, becomes increasingly aligned with land value and related variables over iterations.
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > New York (0.04)
- (5 more...)
- Transportation > Infrastructure & Services (0.49)
- Transportation > Ground > Road (0.49)
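A hedged sketch of the similarity-evaluation idea in the graph cellular automata abstract above, not the GCA-Sim implementation: a node signal (normalized degree) evolves under a simple averaging propagation rule, its distribution is recorded at each stage, and two road networks are compared by the mean Jensen-Shannon divergence between the recorded distributions. The propagation rule, signal choice, and histogram settings are illustrative assumptions.

```python
# Sketch: stage-wise signal distributions under a toy propagation rule, compared
# via Jensen-Shannon divergence. Lower divergence = more similar networks.
import numpy as np
import networkx as nx
from scipy.spatial.distance import jensenshannon

def evolve_distributions(G, steps=5, bins=20):
    A = nx.to_numpy_array(G)
    deg = A.sum(axis=1)
    x = deg / (deg.max() + 1e-9)                 # normalized degree as the initial signal
    hists = []
    for _ in range(steps):
        # toy rule: average of a node's value and its neighbors' mean value
        x = (x + A @ x / (deg + 1e-9)) / 2.0
        h, _ = np.histogram(x, bins=bins, range=(0, 1), density=True)
        hists.append(h / (h.sum() + 1e-9))
    return hists

def network_similarity(G1, G2):
    d1, d2 = evolve_distributions(G1), evolve_distributions(G2)
    return 1.0 - np.mean([jensenshannon(a, b) for a, b in zip(d1, d2)])

if __name__ == "__main__":
    grid = nx.grid_2d_graph(10, 10)                          # stand-in for a planned grid
    organic = nx.random_geometric_graph(100, 0.18, seed=1)   # stand-in for organic growth
    print("grid vs grid:   ", network_similarity(grid, nx.grid_2d_graph(10, 10)))
    print("grid vs organic:", network_similarity(grid, organic))
```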
Human and AI Trust: Trust Attitude Measurement Instrument
With the current progress of Artificial Intelligence (AI) technology and its increasingly broad applications, trust is seen as a required criterion for AI usage, acceptance, and deployment. A robust measurement instrument is essential to correctly evaluate trust from a human-centered perspective. This paper describes the development and validation of a trust measurement instrument that follows psychometric principles and consists of a 16-item trust scale. The instrument was built explicitly for research in human-AI interaction to measure trust attitudes towards AI systems from a layperson (non-expert) perspective. The use case used to develop the scale was AI medical support systems (specifically cancer/health prediction). The scale development (Measurement Item Development) and validation (Measurement Item Evaluation) involved six research stages: item development, item evaluation, survey administration, test of dimensionality, test of reliability, and test of validity. The results of the six-stage evaluation show that the proposed trust measurement instrument is empirically reliable and valid for systematically measuring and comparing non-experts' trust in AI Medical Support Systems.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (1.00)
- Information Technology (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
- Education (1.00)
- (5 more...)
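An illustrative sketch tied to the "test of reliability" stage named in the trust-instrument abstract above, not the authors' analysis: Cronbach's alpha, the standard internal-reliability statistic for a multi-item scale, computed here for a hypothetical 16-item Likert-scale response matrix with simulated data.

```python
# Sketch: Cronbach's alpha for a 16-item scale on simulated 1-7 Likert responses.
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) array of scale responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()     # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 1))               # simulated underlying trust attitude
    responses = np.clip(np.round(4 + 1.5 * latent + rng.normal(size=(200, 16))), 1, 7)
    print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```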
Bag of Coins: A Statistical Probe into Neural Confidence Structures
Aich, Agnideep, Aich, Ashit Baran, Murshed, Md Monzur, Hewage, Sameera, Wade, Bruce
Modern neural networks, despite their high accuracy, often produce poorly calibrated confidence scores, limiting their reliability in high-stakes applications. Existing calibration methods typically post-process model outputs without interrogating the internal consistency of the predictions themselves. In this work, we introduce a novel, non-parametric statistical probe, the Bag-of-Coins (BoC) test, that examines the internal consistency of a classifier's logits. The BoC test reframes confidence estimation as a frequentist hypothesis test: does the model's top-ranked class win 1-v-1 contests against random competitors at a rate consistent with its own stated softmax probability? When applied to modern deep learning architectures, this simple probe reveals a fundamental dichotomy. On Vision Transformers (ViTs), the BoC output serves as a state-of-the-art confidence score, achieving near-perfect calibration with an ECE of 0.0212, an 88% improvement over a temperature-scaled baseline. Conversely, on Convolutional Neural Networks (CNNs) like ResNet, the probe reveals a deep inconsistency between the model's predictions and its internal logit structure, a property missed by traditional metrics. We posit that BoC is not merely a calibration method, but a new diagnostic tool for understanding and exposing the differing ways that popular architectures represent uncertainty.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
- North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
- Asia > India > West Bengal > Kolkata (0.04)
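A hedged sketch of one plausible reading of the Bag-of-Coins abstract above, not the authors' exact procedure: the top-ranked class plays 1-v-1 contests against randomly drawn competitor classes under pairwise softmax odds, and a binomial test checks whether the empirical win rate is consistent with the stated top-class probability. The contest mechanics here are an assumption.

```python
# Sketch: a frequentist consistency probe on a softmax vector (assumed mechanics).
import numpy as np
from scipy.stats import binomtest

def bag_of_coins(probs, n_contests=200, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    top = int(np.argmax(probs))
    competitors = rng.choice([c for c in range(len(probs)) if c != top], size=n_contests)
    # pairwise softmax contest: top beats class j with probability p_top / (p_top + p_j)
    win_prob = probs[top] / (probs[top] + probs[competitors])
    wins = int((rng.random(n_contests) < win_prob).sum())
    test = binomtest(wins, n_contests, p=probs[top])   # H0: win rate matches stated confidence
    return wins / n_contests, test.pvalue

if __name__ == "__main__":
    logits = np.array([4.0, 2.0, 1.0, 0.5, 0.1])
    probs = np.exp(logits) / np.exp(logits).sum()
    rate, pval = bag_of_coins(probs)
    print(f"stated confidence={probs.max():.3f}  contest win rate={rate:.3f}  p-value={pval:.3f}")
```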
Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Chen, Chaoran, Yao, Bingsheng, Zou, Ruishi, Hua, Wenyue, Lyu, Weimin, Li, Toby Jia-Jun, Wang, Dakuo
Role-Playing Agents (RPAs) are an increasingly popular type of LLM agent, simulating human-like behaviors in a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs by systematically reviewing 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from the existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- (5 more...)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Education (1.00)
- Government (0.93)
- Health & Medicine > Consumer Health (0.67)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Min, Nay Myat, Pham, Long H., Li, Yige, Sun, Jun
Recent studies reveal that Large Language Models (LLMs) are susceptible to backdoor attacks, where adversaries embed hidden triggers that manipulate model responses. Existing backdoor defense methods are primarily designed for vision or classification tasks, and are thus ineffective for text generation tasks, leaving LLMs vulnerable. We introduce Internal Consistency Regularization (CROW), a novel defense using consistency regularization finetuning to address layer-wise inconsistencies caused by backdoor triggers. CROW leverages the intuition that clean models exhibit smooth, consistent transitions in hidden representations across layers, whereas backdoored models show noticeable fluctuation when triggered. By enforcing internal consistency through adversarial perturbations and regularization, CROW neutralizes backdoor effects without requiring clean reference models or prior trigger knowledge, relying only on a small set of clean data. This makes it practical for deployment across various LLM architectures. Experimental results demonstrate that CROW consistently achieves significant reductions in attack success rates across diverse backdoor strategies and tasks, including negative sentiment, targeted refusal, and code injection, on models such as Llama-2 (7B, 13B), CodeLlama (7B, 13B), and Mistral-7B, while preserving the model's generative capabilities.
- Asia > Singapore (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
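A minimal sketch of the layer-wise smoothness intuition in the CROW abstract above, not the paper's implementation: a penalty that encourages consecutive hidden states to stay aligned (via cosine similarity), which would be added to a clean-data finetuning loss. The exact loss, adversarial perturbation scheme, and weighting used by CROW are not reproduced here.

```python
# Sketch: a layer-wise consistency penalty over consecutive hidden states.
import torch
import torch.nn.functional as F

def layer_consistency_loss(hidden_states):
    """hidden_states: list of [batch, seq_len, dim] tensors, one per layer."""
    penalties = []
    for prev, cur in zip(hidden_states[:-1], hidden_states[1:]):
        cos = F.cosine_similarity(prev, cur, dim=-1)   # [batch, seq_len]
        penalties.append((1.0 - cos).mean())           # 0 when layer transitions are smooth
    return torch.stack(penalties).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    hs = [torch.randn(2, 8, 32) for _ in range(6)]     # dummy per-layer hidden states
    print("consistency penalty:", float(layer_consistency_loss(hs)))
    # In finetuning this would be combined as: loss = task_loss + lambda_c * penalty
```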
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Liang, Xun, Song, Shichao, Zheng, Zifan, Wang, Hanyu, Yu, Qingchen, Li, Xunkai, Li, Rong-Hua, Wang, Yi, Wang, Zhonghao, Xiong, Feiyu, Li, Zhiyu
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations. To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and Self-Refine have been initiated. They share a commonality: involving LLMs evaluating and updating themselves. Nonetheless, these efforts lack a unified perspective on summarization, as existing surveys predominantly focus on categorization. In this paper, we use a unified perspective of internal consistency, offering explanations for reasoning deficiencies and hallucinations. Internal consistency refers to the consistency in expressions among LLMs' latent, decoding, or response layers based on sampling methodologies. Then, we introduce an effective theoretical framework capable of mining internal consistency, named Self-Feedback. This framework consists of two modules: Self-Evaluation and Self-Update. The former captures internal consistency signals, while the latter leverages the signals to enhance either the model's response or the model itself. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and delve into the concern, "Does Self-Feedback Really Work?" We also propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". The relevant resources are open-sourced at https://github.com/IAAR-Shanghai/ICSFSurvey.
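A conceptual sketch of the Self-Feedback loop described in the survey abstract above (Self-Evaluation capturing an internal-consistency signal, Self-Update acting on it), using answer-level self-consistency as the signal. The sampling function is a stand-in for an actual LLM call; names and thresholds are illustrative assumptions.

```python
# Sketch: Self-Evaluation measures agreement across sampled answers; Self-Update
# accepts the consensus once agreement is high enough, otherwise resamples.
from collections import Counter
import random

def sample_answers(question, n=5, rng=None):
    """Stand-in for n stochastic LLM reasoning paths (hypothetical)."""
    rng = rng or random.Random(0)
    return [rng.choice(["42", "42", "42", "41"]) for _ in range(n)]

def self_evaluate(answers):
    """Self-Evaluation: consistency = share of samples agreeing with the mode."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)

def self_update(question, threshold=0.6, max_rounds=3):
    """Self-Update: return the consensus answer once it is consistent enough."""
    for round_idx in range(max_rounds):
        answers = sample_answers(question, rng=random.Random(round_idx))
        answer, consistency = self_evaluate(answers)
        if consistency >= threshold:
            return answer, consistency
    return answer, consistency    # fall back to the last round's consensus

if __name__ == "__main__":
    print(self_update("What is 6 * 7?"))
```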